Overview

Dataset statistics

Number of variables: 3
Number of observations: 27299925
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 GiB
Average record size in memory: 76.0 B
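
The two memory figures can be cross-checked against the observation count. The sketch below is only an arithmetic check; it assumes the report uses binary units (1 GiB = 2**30 bytes), and the helper name `total_size_gib` is made up for this illustration.

```python
# Figures copied from the dataset statistics above.
N_OBSERVATIONS = 27_299_925
AVG_RECORD_BYTES = 76.0  # average record size in memory

GIB = 2 ** 30  # assumption: the report uses binary gibibytes


def total_size_gib(n_rows: int, bytes_per_row: float) -> float:
    """Estimate the in-memory size in GiB from the per-record average."""
    return n_rows * bytes_per_row / GIB


estimate = total_size_gib(N_OBSERVATIONS, AVG_RECORD_BYTES)
print(f"{estimate:.1f} GiB")  # compare with the reported total size
```

Under that assumption the estimate rounds to the reported 1.9 GiB.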

Variable types

Numeric: 2
Categorical: 1

Alerts

MONTHS_BALANCE has 610965 (2.2%) zeros

Reproduction

Analysis started: 2025-09-06 09:41:30.955457
Analysis finished: 2025-09-06 09:44:06.905299
Duration: 2 minutes and 35.95 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
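
The reported duration follows directly from the two timestamps. A small standard-library sketch of that subtraction (the format string `FMT` is an assumption about how the timestamps are laid out, based on how they are printed here):

```python
from datetime import datetime

# Assumed layout of the timestamps as printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2025-09-06 09:41:30.955457", FMT)
finished = datetime.strptime("2025-09-06 09:44:06.905299", FMT)

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```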

Variables

SK_ID_BUREAU
Real number (ℝ)

Distinct: 817395
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6036297.3
Minimum: 5001709
Maximum: 6842888
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 208.3 MiB

Quantile statistics

Minimum: 5001709
5-th percentile: 5113173
Q1: 5730933
median: 6070821
Q3: 6431951
95-th percentile: 6759761
Maximum: 6842888
Range: 1841179
Interquartile range (IQR): 701018

Descriptive statistics

Standard deviation: 492348.86
Coefficient of variation (CV): 0.081564713
Kurtosis: -0.73796627
Mean: 6036297.3
Median Absolute Deviation (MAD): 353720
Skewness: -0.37218781
Sum: 1.6479046 × 10^14
Variance: 2.424074 × 10^11
Monotonicity: Not monotonic
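
Several of the SK_ID_BUREAU figures are derived from others: the range from the minimum and maximum, the IQR from the quartiles, the CV from the standard deviation and mean, and the variance from the standard deviation. A minimal sketch that recomputes them from the rounded values printed in the report (so the last digits may differ slightly from what the tool computed on the raw data):

```python
# SK_ID_BUREAU figures copied from the report (already rounded there).
minimum, maximum = 5_001_709, 6_842_888
q1, q3 = 5_730_933, 6_431_951
mean, std = 6_036_297.3, 492_348.86

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr)
print(f"{cv:.6f} {variance:.6e}")
```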
Histogram with fixed size bins (bins=50)

Common values

Value                   Count      Frequency (%)
5645521                 97         < 0.1%
6733619                 97         < 0.1%
6176606                 97         < 0.1%
6321834                 97         < 0.1%
6356432                 97         < 0.1%
6356400                 97         < 0.1%
6243196                 97         < 0.1%
6356352                 97         < 0.1%
6356351                 97         < 0.1%
6765607                 97         < 0.1%
Other values (817385)   27298955   > 99.9%

Minimum 10 values

Value     Count   Frequency (%)
5001709   97      < 0.1%
5001710   83      < 0.1%
5001711   4       < 0.1%
5001712   19      < 0.1%
5001713   22      < 0.1%
5001714   15      < 0.1%
5001715   60      < 0.1%
5001716   86      < 0.1%
5001717   22      < 0.1%
5001718   39      < 0.1%

Maximum 10 values

Value     Count   Frequency (%)
6842888   62      < 0.1%
6842887   37      < 0.1%
6842886   33      < 0.1%
6842885   24      < 0.1%
6842884   48      < 0.1%
6842883   37      < 0.1%
6842882   8       < 0.1%
6842881   32      < 0.1%
6842880   58      < 0.1%
6842879   39      < 0.1%

MONTHS_BALANCE
Real number (ℝ)

Distinct: 97
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -30.741687
Minimum: -96
Maximum: 0
Zeros: 610965
Zeros (%): 2.2%
Negative: 26688960
Negative (%): 97.8%
Memory size: 208.3 MiB
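
Because the maximum of MONTHS_BALANCE is 0, the zero and negative counts should account for every observation between them. A short sketch of that consistency check, using only figures from this report:

```python
# MONTHS_BALANCE counts copied from the report.
N_OBSERVATIONS = 27_299_925
zeros = 610_965
negatives = 26_688_960

# With a maximum of 0 there should be no positive values left over.
positives = N_OBSERVATIONS - zeros - negatives

zero_pct = 100 * zeros / N_OBSERVATIONS
negative_pct = 100 * negatives / N_OBSERVATIONS
print(positives, f"{zero_pct:.1f}%", f"{negative_pct:.1f}%")
```

The remainder is zero, and the two shares round to the 2.2% and 97.8% shown above.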

Quantile statistics

Minimum: -96
5-th percentile: -79
Q1: -46
median: -25
Q3: -11
95-th percentile: -2
Maximum: 0
Range: 96
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 23.864509
Coefficient of variation (CV): -0.77629147
Kurtosis: -0.31614292
Mean: -30.741687
Median Absolute Deviation (MAD): 16
Skewness: -0.76068962
Sum: -8.3924574 × 10^8
Variance: 569.51479
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count      Frequency (%)
-1                  622601     2.3%
-2                  619243     2.3%
-3                  615080     2.3%
0                   610965     2.2%
-4                  609138     2.2%
-5                  602663     2.2%
-6                  594277     2.2%
-7                  583794     2.1%
-8                  573566     2.1%
-9                  563804     2.1%
Other values (87)   21304794   78.0%

Minimum 10 values

Value   Count   Frequency (%)
-96     43147   0.2%
-95     46542   0.2%
-94     49965   0.2%
-93     53535   0.2%
-92     57300   0.2%
-91     61144   0.2%
-90     65188   0.2%
-89     69383   0.3%
-88     73452   0.3%
-87     77586   0.3%

Maximum 10 values

Value   Count    Frequency (%)
0       610965   2.2%
-1      622601   2.3%
-2      619243   2.3%
-3      615080   2.3%
-4      609138   2.2%
-5      602663   2.2%
-6      594277   2.2%
-7      583794   2.1%
-8      573566   2.1%
-9      563804   2.1%

STATUS
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 GiB

Value              Count
C                  13646993
0                  7499507
X                  5810482
1                  242347
5                  62406
Other values (3)   38190

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 27299925
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: C
3rd row: C
4th row: C
5th row: C

Common Values

Value   Count      Frequency (%)
C       13646993   50.0%
0       7499507    27.5%
X       5810482    21.3%
1       242347     0.9%
5       62406      0.2%
2       23419      0.1%
3       8924       < 0.1%
4       5847       < 0.1%
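
The Frequency (%) column can be recomputed from the counts. The sketch below uses one display rule that happens to reproduce every entry in this report, including the "< 0.1%" and "> 99.9%" edge cases; the function name `format_share` and the rule itself are assumptions inferred from the table, not taken from the profiling tool's source.

```python
# STATUS counts copied from the Common Values table.
STATUS_COUNTS = {
    "C": 13_646_993, "0": 7_499_507, "X": 5_810_482, "1": 242_347,
    "5": 62_406, "2": 23_419, "3": 8_924, "4": 5_847,
}


def format_share(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place.

    Assumed edge cases: a nonzero share that rounds to 0.000 is shown as
    "< 0.1%", and a share below 1 that rounds to 1.000 as "> 99.9%".
    """
    share = count / total
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    if share < 1 and round(share, 3) == 1:
        return "> 99.9%"
    return f"{share * 100:.1f}%"


total = sum(STATUS_COUNTS.values())
for value, count in STATUS_COUNTS.items():
    print(f"{value}  {count:>10}  {format_share(count, total)}")
```

The counts sum to the 27299925 observations, which is consistent with the column having no missing values.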

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count      Frequency (%)
c       13646993   50.0%
0       7499507    27.5%
x       5810482    21.3%
1       242347     0.9%
5       62406      0.2%
2       23419      0.1%
3       8924       < 0.1%
4       5847       < 0.1%

Most occurring characters

Value   Count      Frequency (%)
C       13646993   50.0%
0       7499507    27.5%
X       5810482    21.3%
1       242347     0.9%
5       62406      0.2%
2       23419      0.1%
3       8924       < 0.1%
4       5847       < 0.1%

Most occurring categories

Value       Count      Frequency (%)
(unknown)   27299925   100.0%

Most frequent character per category

(unknown)

Value   Count      Frequency (%)
C       13646993   50.0%
0       7499507    27.5%
X       5810482    21.3%
1       242347     0.9%
5       62406      0.2%
2       23419      0.1%
3       8924       < 0.1%
4       5847       < 0.1%

Most occurring scripts

Value       Count      Frequency (%)
(unknown)   27299925   100.0%

Most frequent character per script

(unknown)

Value   Count      Frequency (%)
C       13646993   50.0%
0       7499507    27.5%
X       5810482    21.3%
1       242347     0.9%
5       62406      0.2%
2       23419      0.1%
3       8924       < 0.1%
4       5847       < 0.1%

Most occurring blocks

Value       Count      Frequency (%)
(unknown)   27299925   100.0%

Most frequent character per block

(unknown)

Value   Count      Frequency (%)
C       13646993   50.0%
0       7499507    27.5%
X       5810482    21.3%
1       242347     0.9%
5       62406      0.2%
2       23419      0.1%
3       8924       < 0.1%
4       5847       < 0.1%

Interactions

Correlations

                 MONTHS_BALANCE   SK_ID_BUREAU   STATUS
MONTHS_BALANCE   1.000            0.010          0.046
SK_ID_BUREAU     0.010            1.000          0.009
STATUS           0.046            0.009          1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

    SK_ID_BUREAU   MONTHS_BALANCE   STATUS
0   5715448        0                C
1   5715448        -1               C
2   5715448        -2               C
3   5715448        -3               C
4   5715448        -4               C
5   5715448        -5               C
6   5715448        -6               C
7   5715448        -7               C
8   5715448        -8               C
9   5715448        -9               0

Last rows

           SK_ID_BUREAU   MONTHS_BALANCE   STATUS
27299915   5041336        -42              X
27299916   5041336        -43              X
27299917   5041336        -44              X
27299918   5041336        -45              X
27299919   5041336        -46              X
27299920   5041336        -47              X
27299921   5041336        -48              X
27299922   5041336        -49              X
27299923   5041336        -50              X
27299924   5041336        -51              X
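
The sample shows the table's long layout: one SK_ID_BUREAU value spans many rows, one per MONTHS_BALANCE value, each carrying a STATUS. A minimal sketch that folds the first ten sample rows into a per-ID history (the `history` structure and the oldest-to-newest status string are choices made for this illustration only):

```python
from collections import defaultdict

# (SK_ID_BUREAU, MONTHS_BALANCE, STATUS) for the first ten sample rows.
rows = [
    (5715448, 0, "C"), (5715448, -1, "C"), (5715448, -2, "C"),
    (5715448, -3, "C"), (5715448, -4, "C"), (5715448, -5, "C"),
    (5715448, -6, "C"), (5715448, -7, "C"), (5715448, -8, "C"),
    (5715448, -9, "0"),
]

# Map each ID to {month: status}.
history = defaultdict(dict)
for bureau_id, month, status in rows:
    history[bureau_id][month] = status

for bureau_id, months in history.items():
    # Read statuses from the oldest month to the most recent one.
    timeline = "".join(months[m] for m in sorted(months))
    print(bureau_id, f"months {min(months)}..{max(months)}", timeline)
```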